Abstract

Objective: Early recognition and treatment of febrile children with serious infections (SI) improves prognosis; however, early 
detection can be difficult. We aimed to validate the predictive rule-in value of the National Institute for Health and Clinical 
Excellence (NICE) most severe alarming signs or symptoms to identify SI in children. 

Design, Setting and Participants: The 16 most severe ("red") features of the NICE traffic light system were validated in 
seven different primary care and emergency department settings, including 6,260 children presenting with acute illness. 

Main Outcome Measures: We focussed on the individual predictive value of single red features for SI and their 
combinations. Results were presented as positive likelihood ratios, sensitivities and specificities. We categorised "general" 
and "disease-specific" red features. Changes in pre-test probability versus post-test probability for SI were visualised in 
Fagan nomograms. 

Results: Almost all red features had rule-in value for SI, but only four individual red features substantially raised the 
probability of SI in more than one dataset: "does not wake/stay awake", "reduced skin turgor", "non-blanching rash", and 
"focal neurological signs". The presence of ≥3 red features improved prediction of SI but still lacked strong rule-in value as 
likelihood ratios were below 5. 

Conclusions: The rule-in value of the most severe alarming signs or symptoms of the NICE traffic light system for identifying 
children with SI was limited, even when multiple red features were present. Our study highlights the importance of 
assessing the predictive value of alarming signs in clinical guidelines prior to widespread implementation in routine 
practice. 
Introduction

Fever is one of the most common symptoms among children 
presenting to ambulatory care. [1-3] The majority of children 
presenting with an acute illness to ambulatory care will have self- 
limiting viral infections, with only a small proportion having a 
serious infection (SI). [1,4-6] Early recognition and treatment of 
children with SI are related to better prognosis, [7,8] however 
identification of SI at first presentation can be difficult. 

The National Institute for Health and Clinical Excellence 
(NICE) 2013 guideline for the management of children with 
feverish illness provides comprehensive guidance on the assessment,
investigation and management of children presenting at
different settings, including primary care and pediatric specialty
settings. [6,9] One of the key elements of the guideline is a "traffic
light" system for the diagnostic assessment of children under five 
years of age presenting with a feverish illness. This evidence- and
consensus-based system includes clinical features identified from
existing scoring systems for acutely ill children, [10-13] and
disease-specific signs and symptoms. Children with the most 
alarming (or "red") features are considered at higher risk of SI, for 
whom subsequent management includes invasive investigations, 
treatment, and hospital admission. 

As one of the few evidence-based guidelines for children with
fever [14,15] and the only one for both primary and secondary care,
the NICE febrile child guideline has been implemented in many
settings, not only in the United Kingdom but also in other
countries. Recently, two studies reported low specificities for the 
approach that any abnormal amber or red feature would indicate 
possible SI. [16, 17] This could be due to the inclusion of amber 
features, whose association with SI may be weaker. 

In this study we aimed to determine the predictive ("rule-in")
value of the red features of the NICE traffic light system, both for
the individual red features and for their combinations, for identifying
children with SI in various acute pediatric settings in Europe.

Methods 

Identification of datasets 

We used data on seven independent cohorts [4,18-23] collected 
by collaborators of the European Research Network on recognis- 
ing serious InfEctions (ERNIE) group. [24] Data were prospec- 
tively collected at first contact using standardised (site-specific) 
documentation of patient characteristics, except for Monteny et al
[19] where data were collected using structured clinical proformas
separate from the consultation. All datasets were cohort studies of
children in various age ranges (0-16 years), presenting to
ambulatory care settings (i.e. general or family practice, pediatric 
outpatient clinic, pediatric assessment unit or emergency depart- 
ment) with an acute illness or infection. 

Two datasets based on primary care settings were considered as 
low prevalence settings of SI (<5%) and five datasets based on 
emergency care settings as high prevalence settings (>5%).[25] 
More details on the original cohorts have been published
elsewhere. [4,18-23]

Ethical approval 

This research conforms to the Helsinki Declaration and to local 
legislation. The original study authors have all agreed to share 
their data, and had obtained ethical approval from their local 
research ethics committees for the initial data collection, prior to 
this study. 

Processing of included datasets 

Key characteristics of each dataset are shown in table 1. We
selected children under the age of five years with an acute illness
based on general symptoms [4,21,22] or specifically on the
presence of fever [18-20,23], as this is the target group of the
NICE guideline (table 1).

The NICE traffic light system includes 16 red features, which
are categorised into 5 main domains: Colour (1 red feature),
Activity (4 red features), Respiratory (3 red features), Hydration (1
red feature), and Other (7 red features). [6,9] When study variables
were not entirely identical to the red features in the NICE febrile
child guideline, we identified proxies where possible. Identification
and handling of variables has been described earlier; [17] a full list
of all approximations is given in table S1. When a red feature
was not recorded in the dataset and no suitable proxy was
identified, this item was excluded from that specific dataset. Table
S2 outlines the unrecorded and missing data for each dataset
separately.

Missing values were not imputed because the necessary missing- 
at-random assumption was likely to be incorrect. We considered 
red features that were "not documented" in individual patient 
records as "absent", given that the red feature or its proxy was 
recorded in that particular dataset. [17] 

The translation, recoding and data-checking were performed by 
two authors (EK, JV) and the results of each step were discussed 
with all primary study authors. [17] 



Outcome measures 

Serious infections (SI) were defined as sepsis (including
bacteremia), meningitis, pneumonia, osteomyelitis, cellulitis, and
complicated urinary tract infections. [25] Diagnoses of SI
were not based on clinical diagnosis alone; reference standard
test criteria were used to determine final diagnoses of SI. Detailed
descriptions of these reference standard test criteria are available in
the original study papers. [4,18-23] Assessment of the diagnoses to
ensure comparability of outcomes was discussed with the lead
investigator of each study, as described earlier. [17]

Statistical analysis 

The individual red features were analysed in every dataset 
separately. Additionally, results were categorised as "general" red 
features (items 1-7 and 9-10) and "disease-specific" red features 
(items 8 and 11-16). 

We assessed the rule-in value for SI for each red feature
separately by calculating positive likelihood ratios (LR+). Red
features were considered to have rule-in value if they raised the
probability of illness with a positive likelihood ratio of more than
5.0. [25] The univariable association between each individual red
feature and the presence of SI was tested by Chi-square analysis.
Likelihood ratios, sensitivity and specificity were measured for the
presence of ≥1, ≥2 and ≥3 red traffic lights (RTLs). The sensitivity
and specificity for "general" and "disease-specific" red features
were plotted in receiver operating characteristic (ROC) space.
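
As an illustration of these measures, the sketch below derives sensitivity, specificity and the positive likelihood ratio from a 2x2 table. The original analyses were run in SPSS; the function name, the log-method confidence interval and the example counts are assumptions made for illustration and are not taken from the study data.

```python
import math


def diagnostic_accuracy(tp, fp, fn, tn):
    """Sensitivity, specificity and LR+ for one red feature.

    tp/fn: children with SI in whom the feature was present/absent.
    fp/tn: children without SI in whom the feature was present/absent.
    The 95% CI uses the log method (an assumption; the paper does not
    state which interval was used) and requires non-zero tp and fp.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    lr_pos = sensitivity / (1 - specificity)
    # Standard error of ln(LR+) from the four cell counts.
    se_log = math.sqrt(1 / tp - 1 / (tp + fn) + 1 / fp - 1 / (fp + tn))
    ci_low = math.exp(math.log(lr_pos) - 1.96 * se_log)
    ci_high = math.exp(math.log(lr_pos) + 1.96 * se_log)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_pos": lr_pos,
        "ci": (ci_low, ci_high),
        # Rule-in threshold used in this study: LR+ of more than 5.0.
        "rule_in": lr_pos > 5.0,
    }


# Hypothetical counts, not from any of the seven datasets.
result = diagnostic_accuracy(tp=12, fp=30, fn=28, tn=930)
print(result)
```

With these hypothetical counts the feature is present in 30% of children with SI and in about 3% of children without SI, so it would meet the rule-in threshold despite its low sensitivity.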

The incremental diagnostic value of additional red features (up to
more than four) compared with one red feature was evaluated by logistic
regression analyses with forward selection (Wald test, p-value
<0.05).

We visualised the change in pre-test probability versus post-test
probability for SI in a Fagan nomogram. [26]
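
A Fagan nomogram is a graphical form of Bayes' theorem on the odds scale: post-test odds equal pre-test odds multiplied by the likelihood ratio. A minimal sketch of that arithmetic follows; the function name and the example values are hypothetical and chosen only to show the calculation.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability.

    Implements the calculation that a Fagan nomogram performs
    graphically: probability -> odds, multiply by LR, odds -> probability.
    """
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Hypothetical example: 10% prevalence and a feature with LR+ = 5.0.
example = post_test_probability(0.10, 5.0)
print(round(example, 3))
```

In this hypothetical case an LR+ at the study's rule-in threshold of 5.0 lifts a 10% pre-test probability to roughly 36%, which illustrates why even "high" likelihood ratios may leave considerable uncertainty when prevalence is low.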

No overall pooled likelihood ratios were calculated because of 
the substantial clinical heterogeneity between datasets (differences 
in setting, inclusion criteria, immunisation schedules and definition 
of serious infection). [17] All analyses were done with SPSS
software (version 20.0, SPSS Inc, Chicago). 

Results 

Included datasets 

We selected 6,260 children under five years of age from seven
pre-existing datasets (n = 6,260/10,812, 58%) for diagnostic studies in
children with an acute illness (table 1). Children were included
based on fever, [19,20,23] acute illness, [4,18] acute infection, [21]
and referral for meningeal signs. [22] Children with various
severities of co-morbidity were excluded in five studies, [4,19-23]
one study excluded children if the acute episode was caused
by an exacerbation of a chronic condition, [4] and one study
excluded children who required immediate resuscitation [18]
(table 1). All studies included sepsis, meningitis, pneumonia and 
complicated urinary tract infections in their outcome definition. 
Osteomyelitis and cellulitis were explicitly mentioned in five and 
three datasets, respectively. 

The median age of the selected children ranged from 0.8 years 
to 1.9 years. The prevalence of SI ranged from 1.2% to 4.1% in 
two datasets from general practice [4,19] and from 9.3% to 40.2% 
in five datasets from emergency departments and a pediatric 
assessment unit [18,20-23]. 

Red traffic lights included in the datasets 

Data on all red features included in domains "Colour" and 
"Hydration" were available in all datasets. The red features "no 
response to social cues", and "weak, high-pitched or continuous 
cry" of domain "Activity" were not recorded in two [20,23] and
one dataset [18], respectively. Other red features in this domain
were available in all datasets. Red features related to the
"Respiratory" domain were not recorded in four ("grunting")
[4,21-23], one ("tachypnoea") [22], and two ("chest indrawing")
[22,23] datasets, respectively. "Disease-specific" red features
(items 8 and 11-16) were recorded less frequently in all datasets,
particularly in low prevalence settings (range of missing values
0-50%; see table S2).

Performance of individual red traffic lights 

Table 2 shows positive and negative likelihood ratios of the 16 
individual red features for each dataset separately. All red features
with high rule-in value (LR+ >5) are highlighted in bold.

Four of all 16 red features did not achieve high rule-in value
(LR+ <5), including two red features which were not available in
the datasets or did not reach significance (p<0.05) when
present.

The one red feature which provided high rule-in value in two
datasets from both low and higher prevalence settings was "does
not wake or if roused does not stay awake" (LR+ 5.9, 95% CI 3.5-10.0
and LR+ 7.8, 95% CI 4.4-13.6, respectively). The red
features "reduced skin turgor", "non-blanching rash", and "focal
neurological signs" showed high rule-in value in two high
prevalence settings each (range LR+ 5.0-9.7). [18,20,22] The red
features "pale/mottled/ashen/blue", "appears ill to a healthcare
professional", "weak, high-pitched or continuous cry",
"tachypnoea", "moderate or severe chest indrawing", and "age 0-3
months & temperature ≥38°C" showed high rule-in value in one
low prevalence setting (range LR+ 5.9-83.6). [4] High rule-in value
for the red features "grunting" and "bulging fontanelle" was
observed in one high prevalence dataset (range LR+ 7.8-11.3). [20]
In two high prevalence settings, none of the red features showed
high rule-in value. [21,23]

Performance of multiple red traffic lights

The association between SI and the number of positive red 
features with the performance measures of positive likelihood 
ratios, sensitivity and specificity is shown in table 3. We measured 
the maximum predictive value of multiple red features by logistic 
regression analysis and the slope of the ROC-curve. We noted a 
significant increase of rule-in value with the number of positive red 
features in most datasets (range LR+ 2.1-10.0 when ≥3 red
features were present), with the exception of Monteny et al. [19] (p-value
<0.05). This was also observed in the increased values of 
specificity when more red features were present. The presence of 
4 or more red features did not contribute to discriminative value 
compared to up to 3 red features. The proportion of children 
having ≥3 red features ranged from 2% to 50% and did not differ
between low and high prevalence settings. "General" red features 
were almost entirely responsible for the total ROC-area (table 3). 
We did not test disease-specific red features on disease-specific 
outcome measures due to the small numbers of these events. In 
figure 1 we visualised the change in pre-test to post-test probability 
for SI when three or more (general or disease-specific) red features 
were present in a Fagan nomogram. [27] For example, the 9%
pre-test probability of having an SI for a child in the Brent et al dataset
increases to 28% (95% CI 17-42%) post-test probability when
having three or more red features, but decreases only to 7% (95%
CI 6-9%) if fewer than three red features were present.
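
The Brent et al example can be turned around to see which likelihood ratios it implies. The sketch below back-calculates them from the rounded percentages quoted above, so the results are approximations for illustration only and are not values reported in the study tables; the function name is hypothetical.

```python
def implied_likelihood_ratio(pre_test_probability, post_test_probability):
    """Likelihood ratio implied by a shift from pre- to post-test probability.

    Uses the odds form of Bayes' theorem: LR = post-test odds / pre-test odds.
    """
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = post_test_probability / (1 - post_test_probability)
    return post_odds / pre_odds


# Rounded percentages from the Brent et al example in the text.
lr_three_or_more = implied_likelihood_ratio(0.09, 0.28)  # >=3 red features
lr_fewer_than_three = implied_likelihood_ratio(0.09, 0.07)  # <3 red features

print(round(lr_three_or_more, 1), round(lr_fewer_than_three, 2))
```

Under these rounded inputs, three or more red features correspond to a likelihood ratio of roughly 4, below the rule-in threshold of 5.0, while fewer than three red features correspond to a ratio of roughly 0.8, which lowers the probability of SI only slightly.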



Discussion 

Main findings 

This is the first study to broadly validate the diagnostic
performance of the individual red features of the NICE febrile
child guideline, and of their combinations, in acutely ill children in
various settings in Europe. Although we observed rule-in value for almost
all individual red features in at least one dataset, only four red
features raised the probability of SI with a positive likelihood ratio
of more than 5.0 in more than one setting: "does not wake or if
roused does not stay awake", "reduced skin turgor",
"non-blanching rash", and "focal neurological signs". Children with
more than one red feature had an increased risk of SI; however,
more than three red features did not further increase disease
probability.

Comparison with other studies 

To our knowledge there are three previous studies that 
estimated the predictive value of any amber or red feature for 
the detection of SI, but they did not evaluate the individual 
features of the NICE traffic light system separately. De et al. [16] 
found that the NICE traffic light system failed to identify a
substantial proportion of children with serious bacterial infections. 
Combining the amber and red feature categories resulted in a 
sensitivity of 85.8% and specificity of 28.5% for the detection of 
any serious bacterial infections. Within the original data of 
Thompson et al. the diagnostic value of vital signs and the NICE 
traffic light system for identifying children with SI was assessed in a 
pediatric assessment unit. [21] They stated that the presence of one 
or more amber and red features was 85% sensitive, but only 29% 
specific in identifying serious or intermediate infections. [21] 
However, this original study was performed in children up to 16 
years of age in contrast to this present study limited to children up 
to 5 years of age. Finally, a previous study assessing the diagnostic 
value of any abnormal amber or red feature (not considering 
combinations) of the NICE traffic light system to rule-out SI, had 
sensitivity of 97-100% in low and intermediate prevalence settings 
and 87-99% in high prevalence settings. [17] The results of all
three validation studies suggest possible clinical value for ruling
out SI using both amber and red features, but at the expense of a
large group of children testing false positive. However, up to 15% 
of children with a serious infection will be missed. Alternatively, 
the presence of any amber or red feature does not allow ruling-in 
SI considering the very low specificity. In low prevalence settings, 
alarming signs are preferably highly sensitive to correctly rule-out
SI in order to limit incorrect referral. [24] In high prevalence 
settings specificity is more important because a high rate of false 
positive children could result in high admission rates and 
unnecessary investigations. [24] Unfortunately there was too much 
heterogeneity in our datasets to stratify according to prevalence. 

Clinical and research implications 

With decreasing incidence of SI, clinicians may increasingly rely 
on alarming symptoms described in (inter)national clinical 
guidelines. Broad validation could support the wider adoption of 
the NICE guideline in various settings in Europe and other high- 
income countries. Although the traffic light system of the NICE 
febrile child guideline is mostly based on systematic literature
reviews and consensus, only four red features achieved high rule-in
value in more than one dataset and none of them across all
settings. Moreover, in at least as many datasets these four red
features did not achieve high rule-in value, which hampers
strong conclusions.

Figure 1. Calculation of post-test probability for serious infections if ≥3 red traffic lights are present, using a Fagan nomogram.

The rule-in value of several other red features was not
confirmed in multiple settings either, questioning their inclusion 
in this setting-independent traffic light system. 

Our observations of varying rule-in values of red features in the 
7 databases did not support the development of one prediction 
model including the most important red features. However, we 
consistently observed an association between 3 or more red
features and SI, but combinations of red features will never be able
to definitely rule in an SI without uncertainty. This could be due to
dilution of their accuracy by the inclusion of non-specific red features
or because of the interaction between different red features.

The relatively lower recording of "disease-specific" features 
hampered our analyses, in particular in low prevalence settings. 
This may in part have been caused by the fact that it is more 
difficult to identify proxies for such features, in contrast to more 
general features. 

The main findings of our study correspond with the limited
performance of the Yale Observation Scale, on which the NICE
traffic light system is partly based. [17,25] In the revised 2013
guideline [9], two red features of the previous 2007 protocol were
deleted or transferred to amber features: "Age 3-6 months &
temperature ≥39°C" and "bile-stained vomiting". This is
supported by our findings in that we did not find rule-in value for
the former; for the latter, however, only one dataset was available,
which showed high rule-in value. Next, as disease-specific red
features are strongly related to specific but rare diseases, their
positive documentation rate is already expected to be low.
Although these disease-specific red features may be relevant for
one specific outcome, it is difficult to evaluate these in the general 
population of fever with a broad differential diagnosis. However, 
achieving complete certainty with clinical features is not the goal 
here. Rather, red features should lift the probability of SI over a 
certain decision threshold: either to refer, request additional testing 
or start empiric treatment. As we do not know at what specific risk 
thresholds we (intuitively) undertake action, clinical interpretation 
of post-test probabilities as expressed in Fagan nomograms
(figure 1) remains difficult. As diagnostic assessment is a dynamic
process and may be influenced by evolution of symptoms in time, 
repeated assessment of deviating red features in those with only 
one or two features in particular, may improve the evaluation of 
SI. 

Finally, the NICE traffic light system could also be improved by 
taking more recent evidence into account, such as on peripheral 
circulation, parental concern [25] or urine analysis [16]. 



PLOS ONE | www.plosone.org | March 2014 | Volume 9 | Issue 3 | e90847

Predictive Value of the NICE "Red Traffic Lights"




Strengths and limitations 

We assessed the NICE red traffic lights in 6,260 children from 
seven existing datasets with various pediatric populations and 
settings including two low prevalence primary care settings, which 
are usually underrepresented in diagnostic studies in this area. [24] 
In addition, we validated the red features separately to identify 
their individual predictive value. 

Despite the large amount of data, not all red features had been 
recorded in all datasets, necessitating the use of proxy var- 
iables. [17] Furthermore, differences in population characteristics 
(table 1), such as age distribution or prevalence of specific 
diagnoses within the group of SI, prevented the calculation of 
overall diagnostic performance measures. 

Furthermore, by treating missing red features as not present,
and given the likely more complete documentation of red features in ill children,
we may have overestimated our likelihood ratios by increasing the
contrast between children with and without SI.

However, the variability in variables and case-mix reflects 
clinical practice and therefore will strengthen generalizability of 
our results. 

Conclusion 

Our results support the rule-in value of several individual red
features from the NICE febrile child guideline in specific settings,
although not consistently. However, most features had little rule-in
value across multiple settings. The NICE red traffic lights, even
when three or more features are present, seem to have limited
value for ruling-in serious infections. Our results underline the
importance of widely validating the predictive value of individual
red features and of their combinations in clinical guidelines,
prior to widespread dissemination and adoption.